Application of HPCN to Direct Numerical Simulation of Turbulent Flow

Authors

  • R. W. C. P. Verstappen
  • A. E. P. Veldman
  • G. Matthijs van Waveren
Abstract

This poster shows how HPCN can be used as a pathfinding tool for turbulence research. The parallelization of direct numerical simulation of turbulent flow using the data-parallel model and Fortran 95 constructs is treated, both on a shared memory and on a distributed memory computer.

Introduction

Computer simulation has become a major tool for studying turbulent flow in many technological and environmental applications. It is no exaggeration to say that computer simulation has changed the way in which cars, aircraft, and ships are designed. The turbulent flow in these applications typically involves a large range of dynamically significant scales of motion. In current production codes only a few large scales of motion are resolved; all other scales are modelled. In many applications, the required simulation accuracy cannot be reached with the existing turbulence models; see, e.g., [1]. High-performance computing provides a solution: model less and resolve more. In this poster we consider the ultimate approach, in which all scales are resolved and none are modelled. This approach, called Direct Numerical Simulation (DNS) of turbulence, is not yet feasible for flows at Reynolds numbers of Re = $10^{?}$ or higher, since these flows possess (far) too many scales of motion. Low-end applications (Re = $10^{?}$), however, can be attacked with DNS.

Table 1 gives an overview of the computational requirements for DNS, in terms of processing power and memory size. The numbers in the column `internal flow' are based on a DNS that has actually been performed [2]. The numbers in the other columns are based on estimates of how the number of scales of motion increases with the Reynolds number. The numerical algorithms for the DNS of the wing and the car are taken to be as fast as today's algorithms for incompressible turbulent flows in rectangular geometries.

Table 1. Required memory and processing power for DNS

application                        internal flow   wing           car
Reynolds number                    10^?            10^?           10^?
number of grid points              10^?            3·10^?         10^?
floating-point computations        3·10^14         3·10^17        3·10^20
CPU performance (300 hours run)    300 Mflop/s     300 Gflop/s    300 Tflop/s
memory capacity                    1 Gbyte         300 Gbyte      100 Tbyte
data transfer rate to main memory  100 Gbyte/s, 100 Tbyte/s

Thus HPCN is highly necessary as a pathfinding tool in this important and exciting application area. About six orders of magnitude have to be bridged to perform a DNS of a fully developed turbulent flow around a car, for example. Assuming that both computer hardware and computational algorithms continue to progress at the rate of the past three decades (each has become one and a half orders of magnitude faster per decade), it will take roughly two decades to bridge the missing orders of magnitude. The memory requirements can be reduced significantly by using a domain-swapping technique. The last line in Table 1 shows the rate of data transfer between background memory and main memory that is needed to swap two subdomains of the flow in a time equal to the time needed to update one subdomain, i.e. the rate needed to swap without slowing down the computation.

Computational procedure

The incompressible Navier-Stokes equations are discretized using a finite-volume method, where the velocities and pressures are defined on a staggered grid. The pressure term and the incompressibility constraint are integrated implicitly in time; the convective and diffusive terms are treated explicitly. The computation of one time step can be divided into two sub-steps. First, an auxiliary velocity is computed by integrating the convective and diffusive terms of the momentum equations over one time step. After that, the auxiliary velocity is corrected by adding a pressure gradient to it, where the pressure is computed from a Poisson equation.
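The second sub-step of the computational procedure (correcting an auxiliary velocity with a pressure gradient obtained from a Poisson equation) can be illustrated with a minimal sketch. It is not the authors' code. It assumes a uniform 2-D staggered grid that is periodic in both directions, takes the auxiliary velocity as given, and solves the Poisson equation with a plain Conjugate Gradient iteration. All function names (`divergence`, `neg_laplacian`, `solve_poisson_cg`, `project`) are hypothetical.

```python
import math

def divergence(u, v, h):
    """Discrete divergence per cell; u, v live on the left/bottom cell faces."""
    n = len(u)
    return [[(u[(i + 1) % n][j] - u[i][j] + v[i][(j + 1) % n] - v[i][j]) / h
             for j in range(n)] for i in range(n)]

def neg_laplacian(p, h):
    """Minus the 5-point Laplacian with periodic wrap-around (semi-definite)."""
    n = len(p)
    return [[(4.0 * p[i][j] - p[(i + 1) % n][j] - p[i - 1][j]
              - p[i][(j + 1) % n] - p[i][j - 1]) / (h * h)
             for j in range(n)] for i in range(n)]

def dot(a, b):
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def solve_poisson_cg(rhs, h, tol=1e-10, max_iter=500):
    """Plain CG for -Laplacian(p) = rhs; rhs is assumed to have zero mean."""
    n = len(rhs)
    p = [[0.0] * n for _ in range(n)]
    r = [row[:] for row in rhs]      # residual (initial guess is p = 0)
    d = [row[:] for row in rhs]      # search direction
    rr = rr0 = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol * rr0:    # relative residual small enough
            break
        ad = neg_laplacian(d, h)
        alpha = rr / dot(d, ad)
        for i in range(n):
            for j in range(n):
                p[i][j] += alpha * d[i][j]
                r[i][j] -= alpha * ad[i][j]
        rr_new = dot(r, r)
        beta = rr_new / rr
        for i in range(n):
            for j in range(n):
                d[i][j] = r[i][j] + beta * d[i][j]
        rr = rr_new
    return p

def project(u_star, v_star, h, dt):
    """Correct the auxiliary velocity so that its discrete divergence vanishes."""
    n = len(u_star)
    div = divergence(u_star, v_star, h)
    # Laplacian(p) = div(u*) / dt   <=>   -Laplacian(p) = -div(u*) / dt
    rhs = [[-d / dt for d in row] for row in div]
    p = solve_poisson_cg(rhs, h)
    u = [[u_star[i][j] - dt * (p[i][j] - p[i - 1][j]) / h
          for j in range(n)] for i in range(n)]
    v = [[v_star[i][j] - dt * (p[i][j] - p[i][j - 1]) / h
          for j in range(n)] for i in range(n)]
    return u, v, p

# Usage: an arbitrary (non-solenoidal) auxiliary field on an 8 x 8 grid.
n, h, dt = 8, 1.0 / 8, 0.1
u_star = [[math.sin(2 * math.pi * i / n) for j in range(n)] for i in range(n)]
v_star = [[0.5 * math.cos(2 * math.pi * j / n) for j in range(n)] for i in range(n)]
u_new, v_new, pressure = project(u_star, v_star, h, dt)
max_div = max(abs(d) for row in divergence(u_new, v_new, h) for d in row)
print("max |div u| after correction:", max_div)
```

Because the discrete gradient and divergence used here are consistent with the 5-point Laplacian, the corrected field is divergence-free up to the tolerance of the Poisson solve.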
Parallelization

The parallelization of direct numerical simulation of turbulent flow using the data-parallel model and Fortran 95 constructs is treated. We will first consider the data-parallel implementation on a shared memory system, namely a CRAY J932. We will focus on a Fast Fourier/Conjugate Gradient method for solving the discrete Poisson equation for the pressure. Then we will discuss the data-parallel implementation on a TMC CM-5, a distributed memory system. For this machine, data-parallel implementations of both the Poisson solver and the integration of the convection/diffusion equation are considered.

Poisson on a shared memory machine

In this section we make explicit use of the fact that the turbulent flow under consideration is statistically homogeneous in (at least) one direction. The flow can then be described using periodic boundary conditions in that direction, and the Poisson equation for the pressure can be solved using a combination of a Fast Fourier Transform (FFT) method in the homogeneous direction and an Incomplete Choleski Conjugate Gradient (ICCG) method in the spectral space. The FFT can be computed in parallel by treating the unknowns simultaneously in the non-periodic directions, while the ICCG can be computed in parallel by treating the unknowns simultaneously in the periodic direction. The FFT/ICCG Poisson solver runs at about 80 Mflop/s on one vector processor of the CRAY J932, and scales up perfectly. Thus about 2.5 Gflop/s can be achieved on 32 processors.

Poisson on a distributed memory machine

The FFT/ICCG approach, which works very well on the CRAY J932, performs poorly on the CM-5. The reason for this is twofold. Firstly, the FFT and the ICCG parts do not have the same parallel directions. Consequently, on a distributed memory machine like the CM-5, this code generates an enormous traffic of data between the local memories, either in the FFT part or in the ICCG part.
The second cause of the disappointing performance of the FFT/ICCG Poisson solver on the CM-5 is the speed at which the ICCG iterations run. These iterations run so slowly that the Incomplete Choleski preconditioner does not pay off. For these two reasons, we use a plain Conjugate Gradient (CG) method for the Poisson equation on the CM-5. From a numerical point of view, CG is not the fastest way of solving a Poisson equation. Yet it runs at about 25% of the peak of the CM-5, and no preconditioned method can match that. Consequently, these methods cannot displace CG on the CM-5 in our application.

Performance prediction for the convection-diffusion equation

The ratio between the time needed for communications and the time taken by the computations can be used to predict the performance. We count the number of shifts and floating-point operations needed to integrate a convection-diffusion equation. Circular shifting (CSHIFT) is an intrinsic function of Fortran 95. We consider the 2-dimensional case in this poster article. The evaluation of the diffusive term requires 4 shifts. On a non-uniform grid, all 5 elements of the stencil have to be multiplied by different constants and summed, which costs 9 floating-point operations. The evaluation of the convective term requires 7 shifts and 28 flops. Thus, 11 shifts and 38 flops are required for the evaluation of one discrete convection/diffusion equation.

From this, the communication costs can be estimated by counting the number of data elements that have to be moved from one local memory to another for one shift. We consider the operation CSHIFT(u,DIM=1,SHIFT=-1) on a target machine with p local memories. In two dimensions, the array u is defined as u(1:N,1:N). We assume that u can be divided into p subarrays of size $(N/p^{1/2})^2$, and that all elements of one subarray are stored in one local memory. Each subarray then sends one boundary line of $N/p^{1/2}$ elements to a neighbouring memory, so the total number of data motions is $(N/p^{1/2}) \cdot p = N p^{1/2}$.
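The data-motion count for one circular shift can be checked with a small sketch. It assumes a square block decomposition in which p is a perfect square whose root divides N, and it counts by brute force how many elements change local memory when every element moves one position along the first dimension. The function names are hypothetical, and the 8 bytes per element used in the last line assume double-precision reals.

```python
import math

def count_shift_traffic(n, p):
    """Brute-force count of elements that change local memory under a
    circular shift by one along the first dimension of an n x n array
    that is split into p square blocks (one block per local memory)."""
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    b = n // q                              # edge length of one block
    moved = 0
    for i in range(n):
        for j in range(n):
            src = (i // b, j // b)          # memory holding element (i, j)
            dst = (((i + 1) % n) // b, j // b)  # memory holding its new position
            if src != dst:
                moved += 1
    return moved

def shift_traffic_formula(n, p):
    """Closed form from the text: (N / sqrt(p)) * p = N * sqrt(p)."""
    return n * math.isqrt(p)

def bytes_moved(n, p, shifts, bytes_per_element):
    """Total bytes moved for a given number of shifts of the whole array."""
    return shift_traffic_formula(n, p) * shifts * bytes_per_element

# Usage: the brute-force count agrees with the closed form on a small case.
print(count_shift_traffic(64, 16), shift_traffic_formula(64, 16))
# 11 shifts of a 1000 x 1000 array over 64 local memories, 8-byte elements.
print(bytes_moved(1000, 64, 11, 8))
```

The closed form only holds for p > 1; with a single local memory nothing crosses a memory boundary.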
Given the architecture and the communication speed of a certain parallel computer, we can estimate the time needed for the shifts and an upper bound for the speed in Mflop/s. We consider a 16-node CM-5. Each node has 4 vector units, each with its own local memory, leading to a total of 64 local memories. For N = 1000, the total number of bytes to be moved for the evaluation of one component of the right-hand side of the convection-diffusion equation can be estimated as N times $p^{1/2}$ times the number of shifts per data element times the number of bytes per data element: $10^3 \times 8 \times 11 \times 8 = 704000$. On a 16-node CM-5, this data motion takes approximately 0.04 seconds. Let us assume that the flops are fully overlapped with the communications. Then $38 \times 10^6$ floating-point operations would take 0.04 seconds, and the time integration of the convection-diffusion equation would run at 950 Mflop/s (46% of peak). This prediction shows that communication limits the performance. In reality, the time integration of the convection-diffusion equation runs at approximately 15% of peak, and the realistic ratio between communication time and computation time equals 1 to 2.

Conclusion

This poster shows that the data-parallel solution of the unsteady Navier-Stokes equations can be implemented with reasonable performance, both on a shared memory CRAY and on a distributed memory CM-5. This makes direct numerical simulation of turbulence at higher Reynolds numbers possible.

Acknowledgements

The Stichting Nationale Computerfaciliteiten (National Computing Facilities Foundation, NCF), with financial support from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Netherlands Organization for Scientific Research, NWO), is gratefully acknowledged for the use of supercomputer facilities.

Publication date: 1997